Data analysis method for parallel DHP based on Hadoop
YANG Yanxia, FENG Lin
Journal of Computer Applications, 2016, 36(12): 3280-3284. DOI: 10.11772/j.issn.1001-9081.2016.12.3280
Abstract
Generating the frequent 2-itemset L2 from the candidate set C2 is a bottleneck of the Apriori algorithm for mining association rules. The Direct Hashing and Pruning (DHP) algorithm uses a generated hash table H2 to delete unneeded candidate itemsets from C2, which improves the efficiency of generating L2. However, traditional DHP is a serial algorithm and cannot effectively handle large-scale data. To solve this problem, a parallel DHP algorithm, termed H_DHP, was proposed. First, the feasibility of a parallel strategy for DHP was analyzed and proved theoretically. Then, methods for generating the hash table H2 and the frequent itemsets L1 and L3-Lk in parallel were developed on Hadoop, and the association rules were generated using the HBase database. Simulation results show that, compared with the DHP algorithm, the H_DHP algorithm performs better in data processing efficiency, the size of data set it can handle, speedup, and scalability.
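To make the hash-pruning idea concrete, the following is a minimal sketch in plain Python, not the authors' H_DHP implementation and not Hadoop code. It assumes integer item IDs, a toy hash function `(x * 10 + y) mod num_buckets`, and a minimum support count of 2; the function names (`scan_split`, `merge`, `candidate_pairs`) and the sample transactions are hypothetical. It illustrates why the H2 step could be parallelized: item counts and bucket counts computed on separate data splits are additive, so a map-style scan per split followed by a reduce-style sum gives the same table as one serial scan.

```python
from collections import Counter
from itertools import combinations


def bucket(pair, num_buckets):
    # Toy hash function (an assumption for illustration only).
    x, y = pair
    return (x * 10 + y) % num_buckets


def scan_split(transactions, num_buckets):
    # "Map" side: count single items and hash every 2-item subset
    # of each transaction into a bucket of the table H2.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair, num_buckets)] += 1
    return item_counts, bucket_counts


def merge(partials, num_buckets):
    # "Reduce" side: partial counts from different splits simply add up.
    total_items = Counter()
    total_buckets = [0] * num_buckets
    for item_counts, bucket_counts in partials:
        total_items.update(item_counts)
        for i, count in enumerate(bucket_counts):
            total_buckets[i] += count
    return total_items, total_buckets


def candidate_pairs(item_counts, bucket_counts, min_support):
    # Keep a pair in C2 only if both items are frequent AND the pair's
    # bucket count reaches min_support; a bucket count is an upper bound
    # on the support of any pair hashed into it.
    frequent = sorted(i for i, c in item_counts.items() if c >= min_support)
    n = len(bucket_counts)
    return [p for p in combinations(frequent, 2)
            if bucket_counts[bucket(p, n)] >= min_support]


# Hypothetical data: four transactions divided into two splits.
splits = [
    [[1, 3, 4], [2, 3, 5]],
    [[1, 2, 3, 5], [2, 5]],
]
NUM_BUCKETS = 7
MIN_SUPPORT = 2

partials = [scan_split(s, NUM_BUCKETS) for s in splits]
item_counts, h2 = merge(partials, NUM_BUCKETS)
c2 = candidate_pairs(item_counts, h2, MIN_SUPPORT)
print(h2)  # [3, 1, 2, 0, 3, 1, 3]
print(c2)  # [(1, 3), (2, 3), (2, 5), (3, 5)]
```

In this toy run, plain Apriori would form six candidate pairs from the four frequent items, while the hash table removes (1, 2) and (1, 5) before any support counting. The surviving candidates still need an exact support count, because different pairs can collide in the same bucket.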